Robot

ABSTRACT

A robot includes a microphone configured to receive sound signals, and one or more controllers configured to determine a reference sound pressure level of background noise based on a sound signal received at a first time point via the microphone, detect occurrence of a sound event based on the reference sound pressure level and a sound pressure level of a sound signal received at a second time point via the microphone, recognize an event corresponding to the detected sound event, and control an operation of the robot based on the recognized event.

CROSS-REFERENCE TO RELATED APPLICATION

Pursuant to 35 U.S.C. § 119(a), this application claims the benefit of an earlier filing date and right of priority to International Application No. PCT/KR2019/000226, filed on Jan. 7, 2019, the contents of which are hereby incorporated by reference herein in their entirety.

FIELD

The present invention relates to a robot and, more particularly, to a robot for detecting and recognizing a sound event occurring in the vicinity of the robot.

BACKGROUND

A robot is a machine capable of automatically processing or performing a given task based on abilities thereof, and robot applications are generally classified into various fields such as industry, medicine, space and ocean exploration. Recently, communication robots capable of performing communication or interaction with a human being through voice, gesture, etc. have increased in number.

Such communication robots may include various types of robots such as a guide robot disposed at a specific place to inform users of a variety of information or a home robot provided in the home. In addition, the communication robots may include an educational robot for guiding or assisting learning of a learner through interaction with the learner.

Such robots may be implemented to interact with a user or a learner using various elements. For example, the robot may include a microphone for acquiring sound generated in the vicinity of the robot or a camera for acquiring an image of the vicinity of the robot.

Accordingly, technologies have recently been developed for providing a robot that more accurately recognizes various types of events occurring in the vicinity of the robot using an element such as a microphone or a camera and actively performs interaction or a control operation based on the result of recognition.

SUMMARY

An object of the present invention is to provide a robot capable of recognizing an event from sound generated in the surroundings and performing active interaction based on the recognized event.

Another object of the present invention is to provide a robot capable of more smoothly interacting with a user by detecting a direction, in which sound is generated.

A robot according to an embodiment of the present invention includes a microphone configured to receive sound signals, and one or more controllers configured to determine a reference sound pressure level of background noise based on a sound signal received at a first time point via the microphone, detect occurrence of a sound event based on the reference sound pressure level and a sound pressure level of a sound signal received at a second time point via the microphone, recognize an event corresponding to the detected sound event, and control an operation of the robot based on the recognized event.

In one embodiment, the one or more controllers are further configured to detect the sound event when the sound pressure level of the sound signal received at the second time point exceeds a threshold sound pressure level which is set based on the reference sound pressure level.

In some embodiments, the threshold sound pressure level (SPL) may decrease as the reference SPL increases.

In some embodiments, the SPL calculator may calculate the SPL of the sound signal in a predetermined operation period, and the background noise analyzer may set the reference SPL of the background noise based on each sound signal in at least one continuous operation period.

In some embodiments, the background noise analyzer may acquire a maximum SPL, a minimum SPL and SPL change information from the SPL of each sound signal in the at least one continuous operation period, and variably set the reference SPL according to the acquired maximum SPL, minimum SPL and SPL change information.

The sound event detector may provide information on a section, in which occurrence of the sound event is detected, and the robot may further include a sound slicing module configured to extract a sound signal corresponding to the section of the sound signal based on the provided information.

The robot may further include a memory configured to store a plurality of signal characteristics corresponding to a plurality of sound events, and the sound event recognizer may extract a signal characteristic of the sound signal, compare the extracted signal characteristic with the plurality of signal characteristics, and output a sound event corresponding to a signal characteristic matching the extracted signal characteristic among the plurality of signal characteristics as a result of recognition.

In some embodiments, the signal characteristic may include at least one of a frequency characteristic or a signal change characteristic according to elapse of time.

In some embodiments, the sound event recognizer may calculate similarity between each of the plurality of signal characteristics and the extracted signal characteristic, and detect, as the signal characteristic matching the extracted signal characteristic, the signal characteristic having the highest calculated similarity among those signal characteristics whose calculated similarity is equal to or greater than a reference similarity.

The robot may further include a display unit, a sound output unit, and a controller configured to control the display unit and the sound output unit, and the controller may control at least one of the display unit or the sound output unit based on a sound event recognized by the sound event recognizer.

In some embodiments, the memory may store interaction data corresponding to each of a plurality of sound events, and the controller may control at least one of the display unit or the sound output unit based on interaction data corresponding to the recognized sound event among the plurality of interaction data.

In some embodiments, the robot may further include an A-weighting filter configured to filter the sound signal, and the SPL calculator may measure an SPL of the sound signal filtered by the A-weighting filter.

A robot according to an embodiment of the present invention includes a plurality of microphones disposed to be spaced apart from each other, a display unit and a camera disposed to be directed in one direction, a sound event direction detector configured to acquire a plurality of sound signals from the plurality of microphones and to detect a direction, in which sound is generated, based on the plurality of sound signals, and a controller configured to control a rotation mechanism for rotating the robot such that the display unit and the camera are directed in the detected direction.

In some embodiments, the sound event direction detector may acquire sound pressure levels (SPLs) of the plurality of sound signals, estimate a distance between each of the plurality of microphones and a position where the sound is generated, from the plurality of acquired SPLs, and detect the direction, in which the sound is generated, based on positions of the plurality of microphones and the estimated distance.

In some embodiments, the sound event direction detector may detect the direction, in which the sound is generated, based on a difference between times when the plurality of sound signals is acquired.

The controller may control the camera to acquire an image including the direction, in which the sound is generated, after rotation such that the display unit and the camera are directed in the detected direction, and detect presence of a user from the image.

The controller may control the display unit and a sound output unit based on interaction data for interaction with the user, when presence of the user is detected from the image.

In some embodiments, when presence of a plurality of users is detected from the image, the controller may recognize a user having a largest facial region among the detected users as a user related to the sound.

The robot may further include a communication unit configured to establish connection with a terminal of a user, and the controller may transmit a message or notification related to the sound to the terminal of the user through the communication unit, when absence of the user is detected from the image.

The rotation mechanism may include a motor provided to rotate at least a portion of the robot about a vertical axis.

An embodiment of the present disclosure includes a machine-readable non-transitory medium having stored thereon machine-executable instructions for controlling a robot, the instructions comprising determining a reference sound pressure level of background noise based on a sound signal received at a first time point via a microphone of the robot, detecting occurrence of a sound event based on the reference sound pressure level and a sound pressure level of a second sound signal received at a second time point via the microphone, recognizing an event corresponding to the detected sound event, and controlling an operation of the robot based on the recognized event.

In another embodiment, the robot may include a plurality of microphones configured to receive sound signals, a display, a camera, and one or more controllers configured to determine a source direction of a sound signal received via the plurality of microphones, and control a rotation mechanism of the robot to rotate the robot such that the display and/or the camera face the source direction of the sound signal.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a perspective view of a robot according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the control configuration of a robot according to an embodiment of the present invention.

FIG. 3 is a block diagram showing the configuration of a sound event analyzer shown in FIG. 2 in detail.

FIG. 4 is a flowchart schematically illustrating operation of detecting and recognizing a sound event of a robot according to an embodiment of the present invention.

FIG. 5 is a flowchart illustrating operation of setting a reference sound pressure level (SPL) of background noise from a sound signal at a robot according to an embodiment of the present invention.

FIG. 6 is a graph illustrating an example of a threshold SPL variably set according to a reference SPL of background noise and an example of a detected SPL, at which occurrence of a sound event is detected according to the reference SPL and the threshold SPL.

FIG. 7 is a flowchart illustrating operation of detecting occurrence of a sound event from a sound signal and operation of recognizing the sound event when the sound event is detected, at a robot according to an embodiment of the present invention.

FIG. 8 is a view showing an example related to the embodiment of FIGS. 4 to 7.

FIG. 9 is a block diagram showing a controller of a robot according to an embodiment of the present invention.

FIG. 10 is a flowchart illustrating operation of detecting a direction, in which sound is generated, of a robot according to an embodiment of the present invention and control operation related thereto.

FIGS. 11 to 12 are views showing examples related to the embodiment of FIG. 10.

FIGS. 13 to 15 are views showing examples of operation of recognizing a user related to sound at a robot when a plurality of users is located in a direction in which sound is generated.

FIG. 16 is a view showing an example of operation performed by a robot when a user is not located in a direction in which sound is generated.

DETAILED DESCRIPTION OF THE EMBODIMENTS

Description will now be given in detail according to exemplary embodiments disclosed herein, with reference to the accompanying drawings. For the sake of brief description with reference to the drawings, the same or equivalent components may be provided with the same reference numbers, and description thereof will not be repeated. In general, a suffix such as “module” and “unit” may be used to refer to elements or components. Use of such a suffix herein is merely intended to facilitate description of the specification, and the suffix itself is not intended to give any special meaning or function. In the present disclosure, that which is well-known to one of ordinary skill in the relevant art has generally been omitted for the sake of brevity. The accompanying drawings are used to help easily understand various technical features and it should be understood that the embodiments presented herein are not limited by the accompanying drawings. As such, the present disclosure should be construed to extend to any alterations, equivalents and substitutes in addition to those which are particularly set out in the accompanying drawings. Further, while the term “robot” is used in this disclosure, it will be understood by those of ordinary skill in the art that the disclosure is not limited to devices deemed solely with a robotic function or purpose, and that the embodiments of the present disclosure may be implemented with various other types of devices, terminals, and apparatuses, including various configurations and types of computers, electronic terminals, personal and home devices, appliances, and the like.

It will be understood that although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are generally only used to distinguish one element from another.

It will be understood that if an element is referred to as being “connected to” or “coupled to” another element, the element can be directly connected with the other element or intervening elements may also be present. In contrast, if an element is referred to as being “directly connected to” or “directly coupled to” another element, there are no intervening elements present.

A singular representation may include a plural representation unless it represents a definitely different meaning from the context. Terms such as “include” or “has” are used herein, and it should be understood that they are intended to indicate an existence of several components, functions or steps, disclosed in the specification, and it is also understood that greater or fewer components, functions, or steps may likewise be utilized.

FIG. 1 is a perspective view of a robot according to an embodiment of the present invention, and FIG. 2 is a block diagram showing the control configuration of a robot according to an embodiment of the present invention.

Referring to FIG. 1, a robot 1 may be a communication robot for providing information to a user or inducing a user to perform a specific action through communication or interaction with the user.

For example, the robot 1 may be a home robot located in the home. Such a home robot may provide a variety of information to the user through interaction with the user or perform operation of monitoring an event occurring in the home.

In order to perform the above-described operations, the robot 1 may include a camera 132 (see FIG. 2) for acquiring an image of a user or the vicinity of the robot, at least one microphone 124 (see FIG. 2) for acquiring sound of a user or the vicinity of the robot, a display unit 142 for outputting graphics or text, and an input/output unit such as a sound output unit 144 (e.g., a speaker) for outputting voice or sound.

The robot 1 may include microphone holes 125a to 125c formed in an outer surface of a cover (or a case), in order to smoothly acquire sound of the outside of the robot through the at least one microphone 124 implemented therein. Each of the microphone holes 125a to 125c may be formed at a position corresponding to any one microphone 124, and the microphone 124 may communicate with the outside through the microphone holes 125a to 125c. Although three microphone holes 125a to 125c are shown as being formed in FIG. 1, the present invention is not limited thereto. Meanwhile, as described below with reference to FIGS. 9 to 16, the robot 1 may include a plurality of (or at least three) microphones and detect a direction, in which sound is generated, using the plurality of microphones.

The display unit 142 may be disposed to face in one direction from the robot 1. Hereinafter, the direction of the display unit 142 is defined as a front side of the robot 1. Meanwhile, although the sound output unit 144 is shown as being provided at the lower side of the robot 1, the position of the sound output unit 144 may be variously changed according to embodiments.

Although not shown, the robot 1 may further include a movement unit (traveling unit) for moving from one position to another position. For example, the movement unit may include at least one wheel and a motor for rotating the wheel.

Hereinafter, an example of the control elements included in the robot 1 will be described in detail with reference to FIG. 2.

Referring to FIG. 2, the robot 1 may include a communication unit 11, an input unit 12, a sensor unit 13, an output unit 14, a rotation mechanism 15, a memory 16, a controller 17 and a power supply 18. The elements shown in FIG. 2 are examples for convenience of description and the robot 1 may include more or fewer elements than those shown in FIG. 2.

The communication unit 11 may include a communication module for connecting the robot 1 to a server, a mobile terminal, another robot, etc. through a network. For example, the communication unit 11 may include a short-range communication module such as Bluetooth or near field communication (NFC), a wireless Internet module such as Wi-Fi, and a mobile communication module such as long term evolution (LTE).

For example, the robot 1 may be connected to the network through an access point such as a router. Accordingly, the robot 1 may provide the server or the mobile terminal with a variety of information acquired by the input unit 12 or the sensor unit 13 through the network. In addition, the robot 1 may receive program data (firmware, etc.) related to operation of the robot 1 from the server through the network. In some embodiments, the robot 1 may share a variety of information with other robots.

The input unit 12 may include at least one input means for inputting a signal or data corresponding to a user's operation or other actions (voice utterance, etc.) or sound generated in the vicinity of the robot 1. For example, the at least one input means may include a physical input means such as a button or a dial, a touch input unit 122 such as a touch pad or a touch panel, a microphone 124 for receiving a user's voice or sound generated in the vicinity of the robot 1, etc. The user may input a request or a command to the robot 1 by operating the input unit 12.

In some embodiments, the controller 17 of the robot 1 may detect occurrence of a specific event, by detecting whether a sound component corresponding to the specific event is included in the sound signal, based on the sound signal received through the microphone 124. Based on the result of detection, the controller 17 may perform operation of recognizing the specific event.

Hereinafter, for convenience of description, the specific event is defined as a “sound event”. In addition, “including a sound event in a sound signal” or “occurrence of a sound event” means that a sound component corresponding to the sound event is included in the sound signal.

In addition, when the robot 1 includes a plurality of microphones 124, the controller 17 may detect a direction, in which sound is generated, based on the sound signal received from each of the plurality of microphones 124.

The sensor unit 13 may include at least one sensor for sensing a variety of information on the vicinity of the robot 1. For example, the sensor unit 13 may include various sensors such as a camera 132, a proximity sensor 134 and an illumination sensor 136.

The camera 132 may acquire the image of the vicinity of the robot 1. In some embodiments, the controller 17 may acquire an image including a user's face through the camera 132, thereby recognizing the user. Alternatively, the controller 17 may acquire the gesture or expression of the user through the camera 132. In this case, the camera 132 may function as the input unit 12.

The proximity sensor 134 may detect that an object such as a user is approaching the robot 1. For example, when approaching of the user is detected by the proximity sensor 134, the controller 17 may output an initial screen or initial voice through the output unit 14, thereby inducing the user to use the robot 1.

The illumination sensor 136 may detect brightness of a space in which the robot 1 is placed. The controller 17 may perform various operations based on the result detected by the illumination sensor 136 and/or time zone information.

The output unit 14 may output a variety of information related to the operation or state of the robot 1 or various services, programs and applications performed by the robot 1. In addition, the output unit 14 may output a variety of messages or information for performing interaction with the user of the robot 1.

For example, the output unit 14 may include the display unit 142, the sound output unit 144, and a light output unit 146.

The display unit 142 may output the variety of information or messages in the form of graphics. In some embodiments, the display unit 142 may be implemented in the form of a touchscreen along with the touch input unit 122. In this case, the display unit 142 may function not only as an output means but also as an input means.

The sound output unit 144 may output the variety of information or messages in the form of voice or sound. For example, the sound output unit 144 may include a speaker.

The light output unit 146 may be implemented by a light source such as an LED. The controller 17 may display the state of the robot 1 through the light output unit 146. In some embodiments, the light output unit 146 may provide the user with the variety of information along with the display unit 142 and/or the sound output unit 144, as an auxiliary output means.

The rotation mechanism 15 may include elements (e.g., a motor, etc.) for rotating the robot 1 about a vertical axis. The controller 17 may control the rotation mechanism 15 to rotate the robot 1, thereby changing the direction of the display unit 142 of the robot 1 (or the direction of the front surface of the robot 1).

A variety of data such as control data for controlling operation of the elements included in the robot 1 or data for performing operation based on input acquired through the input unit 12 or information acquired through the sensor unit 13 may be stored in the memory 16.

In addition, program data such as software modules or applications executed by any one of at least one processor or controller included in the controller 17 may be stored in the memory 16.

In addition, characteristic information of each of a plurality of sound events may be stored in the memory 16. The characteristic information may include information for identifying sound events, such as frequency characteristics of a sound signal or signal change characteristics over time. The plurality of sound events may include events (e.g., opening/closing of a door, door-lock operation, baby crying, etc.) occurring in a space (e.g., home) where the robot 1 is placed or various events (e.g., robot calling, conversation, etc.) caused by the user. When the above events occur, sound corresponding to each of the events may be generated, and the robot 1 may acquire a sound signal corresponding to the above sound through the microphone 124.

As hardware, the memory 16 may be any of various storage devices such as a ROM, a RAM, an EPROM, a flash drive, a hard drive, etc.

The controller 17 may include at least one processor or controller for controlling operation of the robot 1. Specifically, the controller 17 may include at least one of a CPU, an application processor (AP), a microcomputer, an integrated circuit, an application specific integrated circuit (ASIC), etc.

The controller 17 may perform operations according to various embodiments of the robot 1, which will be described below with reference to FIGS. 4 to 16. The at least one processor or controller included in the controller 17 may perform the above operations using the program data or algorithm stored in the memory 16.

For example, the controller 17 may include a processor 172, an image signal processor (ISP) 174 and a display controller 176.

The processor 172 may control overall operation of the elements included in the robot 1. The ISP 174 may process an image signal acquired through the camera 132 to generate image data. The display controller 176 may control operation of the display unit 142 based on a signal or data received from the processor 172. The display unit 142 may output graphics or text under control of the display controller 176.

In some embodiments, the ISP 174 and/or the display controller 176 may be included in the processor 172. In this case, the processor 172 may be implemented by a unified processor for performing operation of the ISP 174 and/or the display controller 176.

The robot 1 according to the embodiment of the present invention may further include a sound event analyzer 200. The sound event analyzer 200 and its various elements may be implemented by the one or more controllers 17, the one or more processors, or a software module which may be executed by the one or more controllers 17, processors, or other processing component. In some embodiments, the sound event analyzer 200 may be implemented by a hardware device independently of the controller 17, such as a specialized controller or processor, and the like.

The sound event analyzer 200 may detect occurrence of a sound event based on the sound signal received through the microphone 124 and recognize the type of the sound event upon detecting occurrence of the sound event. The sound event analyzer 200 will be described in greater detail below with reference to FIG. 3.

Meanwhile, the power supply 18 of the robot 1 may supply power necessary for operation of the elements included in the robot 1. For example, the power supply 18 may include a power connector capable of connecting an external wired power cable and a battery for storing and supplying power to the elements. In some embodiments, the power supply 18 may further include a wireless charging module for wirelessly receiving power and charging the battery.

FIG. 3 is a block diagram showing the configuration of the sound event analyzer shown in FIG. 2 in detail.

Referring to FIG. 3, the sound event analyzer 200 may include a sound pressure level (SPL) calculation block 210, a background noise analyzer 220, and a sound event analyzing block 230.

The SPL calculation block 210 may calculate the SPL of the sound signal received from the microphone 124. The SPL is information indicating the intensity of sound corresponding to the sound signal and expresses, in decibels (dB), the ratio of the sound pressure of the sound signal to a reference sound pressure.

For example, the SPL calculation block 210 may include a signal compensator 212, an A-weighting filter 214, and an SPL calculator 216.

The signal compensator 212 may compensate for the sound signal acquired through the microphone 124 based on the hardware characteristics of the microphone 124.

In this regard, the microphone 124 may have different sensitivities for each frequency band according to hardware characteristics. That is, even if sound having the same intensity (amplitude) in all frequency bands is generated, the sound signal acquired by the microphone 124 may have different amplitudes according to frequency bands. As a result, the sound signal may differ from the actually generated sound, that is, the sound signal may be distorted. Accordingly, the signal compensator 212 may compensate for the sound signal, thereby acquiring substantially the same sound signal as the actually generated sound (reducing distortion).

The A-weighting filter 214 filters the sound signal compensated for by the signal compensator 212 based on the human auditory characteristics and provides the filtered sound signal to the SPL calculator 216.

For example, the A-weighting filter 214 is designed based on a human audible frequency and an equal loudness contour, thereby filtering the sound signal according to the human auditory characteristics. As a result, the robot 1 may detect and recognize a sound event from the sound signal acquired according to the human auditory characteristics, thereby reacting to the sound event similarly to a human.

The A-weighting filter 214 is merely an example of the filter included in the SPL calculation block 210 and thus various filters designed according to the human auditory characteristics may be provided in the SPL calculation block 210 in addition to the A-weighting filter 214.

The SPL calculator 216 may calculate the SPL of the sound signal filtered by the A-weighting filter 214. The SPL is calculated based on the ratio of the sound pressure of the sound signal to the reference sound pressure. The method of calculating the SPL is widely known in the art and thus a description thereof will be omitted.
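Purely as a non-limiting illustration of this widely known calculation, the SPL of one block of samples may be sketched as follows. The sketch assumes that the samples have already been compensated, filtered, and calibrated to sound pressure in pascals, and it uses the conventional reference sound pressure of 20 micropascals in air; the function name sound_pressure_level and the constant P_REF are hypothetical and do not appear in the embodiments.

```python
import math

# Assumption: conventional reference sound pressure in air, 20 micropascals.
P_REF = 20e-6


def sound_pressure_level(samples, p_ref=P_REF):
    """Return the SPL in dB of a block of sound pressure samples (pascals).

    Hypothetical helper: the SPL is 20*log10(p_rms / p_ref), where p_rms is
    the root-mean-square sound pressure of the block.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        # Complete silence has no finite level in decibels.
        return float("-inf")
    return 20.0 * math.log10(rms / p_ref)
```

For example, a block whose root-mean-square pressure is 1 pascal yields an SPL of approximately 94 dB under these assumptions.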

The calculated SPL may be provided to the background noise analyzer 220 and the sound event analyzing block 230. Meanwhile, the SPL value input to the background noise analyzer 220 may be input after being delayed by a delay block 218. That is, the background noise analyzer 220 may analyze background noise based on a currently input sound signal and provide a reference SPL and a threshold SPL set based on the analyzed background noise to the sound event analyzing block 230. In other words, the background noise may be used to analyze whether a sound event is included in the sound signal that follows the currently input sound signal.

Specifically, the reference SPL and the threshold SPL of the background noise are set based on the SPL of the currently input sound signal, and the sound event analyzing block 230 may detect whether the sound event is included in the next input sound signal based on the SPL of the next input sound signal, the reference SPL and the threshold SPL.

The currently input sound signal and the next input sound signal may be distinguished according to the operation period of the sound event analyzer 200. That is, the sound signal may be continuously acquired through the microphone 124, and the sound event analyzer 200 may analyze the continuously acquired sound signals in units of a predetermined operation period to detect and recognize the sound event. For example, if it is assumed that the currently input sound signal corresponds to a sound signal of a first period, the next input sound signal may correspond to a sound signal of a second period which follows the first period.
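As a minimal sketch of this period-based processing, and not a definitive implementation, a continuously acquired signal may be divided into consecutive operation periods, and the SPL of each period may be paired with the SPL of the preceding period to mimic a one-period delay. The function names split_into_periods and pair_with_previous, the non-overlapping framing, and the dropping of a trailing partial period are assumptions made only for illustration.

```python
def split_into_periods(samples, period_len):
    """Split samples into consecutive, non-overlapping operation periods.

    Assumption: a trailing partial period is dropped.
    """
    if period_len <= 0:
        raise ValueError("period_len must be positive")
    n_full = len(samples) // period_len
    return [samples[i * period_len:(i + 1) * period_len] for i in range(n_full)]


def pair_with_previous(spl_per_period):
    """Pair each period's SPL with that of the preceding period.

    Each tuple is (previous_spl, current_spl): the first value would feed the
    background noise analysis, and the second is the value to be tested.
    """
    return list(zip(spl_per_period[:-1], spl_per_period[1:]))
```

With three periods having SPLs of 40, 42 and 70 dB, for instance, the second period would be evaluated against background noise estimated from the first, and the third against background noise estimated from the second.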

The background noise analyzer 220 may set the reference SPL of the background noise based on the SPL of the sound signal calculated by the SPL calculation block 210, and may set the threshold SPL based on the reference SPL.

A background noise estimation module 222 included in the background noise analyzer 220 may set the reference SPL of the background noise based on the SPL of each of the sound signals of at least one continuous period.

The background noise analyzer 220 may acquire information such as a maximum SPL, a minimum SPL, and SPL change information (slope change, etc.) from the SPL of each of the sound signals of the at least one continuous period and variably set the reference SPL based on the acquired information. The background noise analyzer 220 may include an algorithm for setting the reference SPL based on the acquired information.

For example, the reference SPL may increase as the maximum SPL and the minimum SPL increase. As another example, if the SPL, based on the acquired information, gradually increases, the reference SPL may gradually increase and, as the SPL decreases (loudness is reduced), the reference SPL may rapidly decrease.
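One possible way to obtain this behavior, offered only as a sketch under stated assumptions, is asymmetric smoothing toward a target derived from the maximum and minimum SPL of the recent periods. The embodiments do not disclose a specific algorithm; the function name update_reference_spl, the midpoint target, and the rates rise_rate and fall_rate are hypothetical values chosen for illustration.

```python
def update_reference_spl(reference, recent_spls, rise_rate=0.1, fall_rate=0.8):
    """Return an updated reference SPL (dB) from the SPLs of recent periods.

    Assumptions (not taken from the embodiments): the target is the midpoint
    of the maximum and minimum recent SPL, and the reference moves toward it
    slowly when rising and quickly when falling.
    """
    if not recent_spls:
        raise ValueError("recent_spls must not be empty")
    target = (max(recent_spls) + min(recent_spls)) / 2.0
    # The direction of change selects the rate: gradual rise, rapid fall.
    rate = rise_rate if target > reference else fall_rate
    return reference + rate * (target - reference)
```

Under these assumed rates, a reference of 40 dB facing recent SPLs of 50 and 60 dB rises only to 41.5 dB, whereas a reference of 60 dB facing recent SPLs of 40 dB falls to 44 dB in a single update.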

A threshold setting module 224 may set the threshold SPL corresponding to the set reference SPL. The threshold SPL may be an element for detecting a sound event having an intensity greater than the sound intensity of the background noise. That is, the sound event analyzing block 230 may detect that a sound event has occurred when a sound signal having an SPL greater than the reference SPL by at least the threshold SPL is detected.

Meanwhile, the threshold setting module 224 may variably set the threshold SPL based on the reference SPL. The threshold SPL may be an SPL value used to determine how much variance is required from the reference SPL before an event is detected. For example, the threshold setting module 224 may set a threshold SPL which decreases as the reference SPL increases, but the present invention is not limited thereto. The threshold SPL may also be varied based on factors other than the reference SPL, such as time of day, detected information of the environment or surroundings, and the like. The threshold SPL may also be varied based on a setting input by a user or another system.

The sound event analyzing block 230 may detect occurrence of a sound event from the received sound signal and recognize the type of the detected sound event.

For example, the sound event analyzing block 230 may include a sound event detector 232, a sound slicing module 234, and a sound event recognizer 236.

The sound event detector 232 may detect occurrence of a sound event based on the calculated SPL of the sound signal and the reference SPL of the background noise. For example, the sound event detector 232 may detect that the sound event has occurred when the calculated SPL is greater than the reference SPL by the threshold SPL or more. In addition, the sound event detector 232 may provide the sound slicing module 234 with information on a start point and an end point of a period (operation period) in which the detected sound event has occurred.

The sound slicing module 234 may extract a sound signal of a period in which occurrence of the sound event is detected by the sound event detector 232. Specifically, from the sound signal received from the microphone 124, the sound slicing module 234 may extract the sound signal between the start point and the end point of the period (operation period) in which occurrence of the sound event is detected by the sound event detector 232, and provide the extracted sound signal to the sound event recognizer 236.

The sound event recognizer 236 may recognize the sound event corresponding to the sound signal based on the signal characteristics of the extracted sound signal.

Specifically, the sound event recognizer 236 may compare a plurality of signal characteristics stored in the memory 16 or an internal memory of the sound event analyzer 200 with the signal characteristics of the extracted sound signal. Each of the plurality of signal characteristics may correspond to any one sound event.

The signal characteristics may include unique characteristics related to the sound event, such as frequency characteristics of the sound event or signal change characteristics according to elapse of time.

The sound event recognizer 236 may calculate similarity between each of the plurality of signal characteristics and the signal characteristic of the extracted sound signal. The sound event recognizer 236 may detect that the signal characteristic having the highest calculated similarity among the plurality of signal characteristics, provided that the similarity is equal to or greater than a reference similarity, matches the signal characteristic of the extracted sound signal.

Based on the result of detection, the sound event recognizer 236 may recognize a sound event included in the extracted sound signal as a sound event corresponding to the matched signal characteristic. The sound event recognizer 236 may provide the result of recognition to the processor 172.
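As a non-limiting illustration, the matching described above may be sketched as follows. The specification does not state how similarity is calculated; cosine similarity over numeric feature vectors is assumed here purely for illustration, and the names cosine_similarity and recognize_event, the example feature vectors and the reference similarity of 0.8 are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recognize_event(features, stored, reference_similarity=0.8):
    """Return the stored event whose characteristic is most similar to
    `features`, or None when even the best similarity is below the
    reference similarity (the event cannot be recognized)."""
    best_name, best_score = None, -1.0
    for name, stored_features in stored.items():
        score = cosine_similarity(features, stored_features)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= reference_similarity:
        return best_name
    return None


# Hypothetical stored signal characteristics, one per sound event.
stored_characteristics = {
    "front_door": [1.0, 0.0, 0.0],
    "baby_crying": [0.0, 1.0, 0.0],
}
result = recognize_event([0.9, 0.1, 0.0], stored_characteristics)
```

A return value of None corresponds to the case in which no stored signal characteristic reaches the reference similarity.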

In some embodiments, when a signal characteristic having similarity equal to or greater than the reference similarity is not present as the result of recognition, the sound event analyzing block 230 may provide information indicating that the sound event cannot be recognized to the processor 172. The processor 172 may output the result of recognition to the user through the output unit 14 or the communication unit 11 and acquire information on the sound event included in the sound signal from the user. The processor 172 may store the information on the signal characteristic acquired from the sound signal and the sound event in the memory 16 based on the acquired information. Accordingly, the sound event analyzer 200 may recognize the sound event included in the sound signal when a sound signal having a similar signal characteristic is received in the future.

Such a sound event recognizer 236 extracts the signal characteristic from the sound signal and compares the extracted signal characteristics with the plurality of signal characteristics to calculate similarity and thus may have relatively higher load as compared to the other elements.

According to the embodiment of the present invention, the sound event recognizer 236 may be activated only when occurrence of a sound event is detected by the sound event detector 232. Therefore, it is possible to efficiently reduce the load as compared to the case where the sound event recognizer 236 is continuously activated, which can improve the overall processing speed or performance of the robot 1.

Hereinafter, embodiments related to operation of detecting and recognizing the sound event of the robot 1 will be described with reference to FIGS. 4 to 8.

FIG. 4 is a flowchart schematically illustrating detecting and recognizing a sound event according to an embodiment of the present invention.

Referring to FIG. 4, a sound signal may be received corresponding to sound generated in the vicinity of the robot 1 (S100).

The microphone 124 or other audio processing component such as an analog to digital audio converter provided in the robot 1 may convert the sound generated in the vicinity of the robot 1 into an electrical signal (sound signal). The controller 17 may receive the converted sound signal from the microphone 124.

The robot 1 may calculate the SPL of the received sound signal (S110).

The SPL calculation block 210 of the sound event analyzer 200 may calculate the SPL from the received sound signal.

As described above with reference to FIG. 3, the SPL calculation block 210 may compensate for the sound signal through the signal compensator 212. The compensated sound signal may be filtered through the A-weighting filter 214, and the SPL calculator 216 may calculate the SPL of the filtered sound signal.
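As a non-limiting illustration, the final step performed by the SPL calculator 216 may be sketched using the standard definition of sound pressure level, 20·log10(p_rms / p_ref) with a reference pressure of 20 micropascals. The sketch assumes that the samples of one operation period have already been compensated and A-weighted and are calibrated in pascals; the function name sound_pressure_level is hypothetical.

```python
import math

P_REF = 20e-6  # standard reference sound pressure in air, in pascals


def sound_pressure_level(samples_pa):
    """Return the SPL in dB of one operation period of pressure samples.

    Assumes the samples are already compensated, A-weighted and
    calibrated in pascals; this function only converts RMS pressure to dB.
    """
    if not samples_pa:
        raise ValueError("need at least one sample")
    mean_square = sum(s * s for s in samples_pa) / len(samples_pa)
    rms = math.sqrt(mean_square)
    if rms == 0.0:
        return float("-inf")  # digital silence has no finite level
    return 20.0 * math.log10(rms / P_REF)


# An RMS pressure of 1 Pa corresponds to roughly 94 dB.
level = sound_pressure_level([1.0, -1.0, 1.0, -1.0])
```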

The calculated SPL may be provided to the background noise analyzer 220 and the sound event analyzing block 230.

The robot 1 may detect occurrence of a sound event based on the calculated SPL and the reference SPL of the background noise (S120).

The sound event detector 232 included in the sound event analyzing block 230 may detect whether a sound event is included in the sound signal, that is, whether a sound event has occurred, based on the SPL of the sound signal calculated by the SPL calculation block 210 and the reference SPL set by the background noise analyzer 220.

Upon detecting that the sound event has occurred (YES of S130), the robot 1 may recognize the sound event included in the sound signal (S140).

The sound event recognizer 236 included in the sound event analyzing block 230 may recognize which sound event is included in the sound signal, upon detecting that the sound event has occurred.

The sound event recognizer 236 may extract the signal characteristic from the sound signal and compare the extracted signal characteristic with the plurality of signal characteristics stored in the memory 16, thereby calculating similarity.

The sound event recognizer 236 may detect that the signal characteristic having the highest calculated similarity among the plurality of signal characteristics, provided that the similarity is equal to or greater than a reference similarity, matches the signal characteristic of the extracted sound signal. The sound event recognizer 236 may recognize that the sound event included in the extracted sound signal is a sound event corresponding to the matched signal characteristic.

The robot 1 may provide interaction corresponding to the recognized sound event (S150).

The processor 172 may acquire information on the recognized sound event from the sound event analyzer 200.

Meanwhile, interaction data corresponding to each of the plurality of sound events may be stored in the memory 16. The interaction data may include graphics, text, sound, voice data, etc. output through the output unit 14. In addition, the interaction data may include a message or notification to be transmitted to the mobile terminal of the user through the communication unit 11 or information related to a variety of processing operations performed in association with the sound event.

The processor 172 may output the interaction data corresponding to the recognized sound event, thereby providing the interaction. For example, the interaction may include communication with the user based on the sound event, a simple reaction to the sound event, etc. For the purposes of this discussion, various operations are discussed as being performed by components of robot 1; however, it will be understood that one or more or all of these operations may be performed by other aspects of robot 1, such as the one or more controllers or processors. Further, it will be understood that one or more or all of these operations may be performed via other means, including other terminals or apparatuses configured to perform the operations which are in communication with robot 1.

Hereinafter, operation of the robot 1 described above with reference to FIG. 4 will be described in detail with reference to FIGS. 5 to 8.

FIG. 5 is a flowchart illustrating operation of setting a reference sound pressure level (SPL) of background noise from a sound signal at a robot according to an embodiment of the present invention.

Referring to FIG. 5, steps S500 and S510 are substantially the same as steps S100 and S110 of FIG. 4 and thus a description thereof will be omitted.

The robot 1 may set the reference SPL of the background noise based on the SPL calculated from the sound signal (S520).

The background noise estimation module 222 included in the background noise analyzer 220 of the robot 1 may receive the calculated SPL of the sound signal during a predetermined period from the SPL calculator 216.

The background noise estimation module 222 may set the reference SPL of the background noise based on the SPL of the sound signal received during at least one continuous period.

As described above with reference to FIG. 3, the background noise estimation module 222 may acquire information such as a maximum SPL, a minimum SPL and SPL change information (slope change, etc.) from the SPL of each sound signal during the at least one continuous period and variably set the reference SPL based on the acquired information.

The robot 1 may set a threshold SPL for detecting occurrence of the sound event based on the set reference SPL (S530).

The threshold setting module 224 may variably set the threshold SPL based on the reference SPL. Matching information of the threshold SPL corresponding to each of the reference SPLs or an equation or algorithm for setting the threshold SPL using the reference SPL may be stored in the memory 16. The threshold setting module 224 may set the threshold SPL based on the matching information or the equation or algorithm stored in the memory 16.

That is, the robot 1 may variably set the threshold SPL according to the intensity of the sound corresponding to the background noise, thereby efficiently detecting and recognizing the sound event even when a combination of various types of sounds is received. For the purposes of this discussion, various operations are discussed as being performed by components of robot 1; however, it will be understood that one or more or all of these operations may be performed by other aspects of robot 1, such as the one or more controllers or processors. Further, it will be understood that one or more or all of these operations may be performed via other means, including other terminals or apparatuses configured to perform the operations which are in communication with robot 1.

FIG. 6 is a graph illustrating an example of a threshold SPL variably set according to a reference SPL of background noise and an example of a detected SPL, at which occurrence of a sound event is detected according to the reference SPL and the threshold SPL.

The matching information or the equation or the algorithm for setting the threshold SPL corresponding to each of the reference SPLs of the background noise may be stored in the memory 16. The graph based on the matching information, the equation or the algorithm is shown in (a) of FIG. 6. For example, if the reference SPL is 45 decibels (dB), the threshold setting module 224 may set the threshold SPL to 15 decibels (dB). The above graph is merely an example, for convenience of description, and the threshold SPL corresponding to the reference SPL may be changed according to the matching information, the equation or the algorithm.

Meanwhile, in the graph shown in (a) of FIG. 6, the threshold SPL may decrease as the reference SPL increases, because there is a limit to the SPL of the sound event.
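As a non-limiting illustration, matching information of the kind described above may be sketched as a lookup table with linear interpolation between entries. Only the pair (45 dB, 15 dB) is taken from the description; the other table entries, the clamping at the table ends, the interpolation itself and the names THRESHOLD_TABLE and threshold_for are hypothetical.

```python
# (reference SPL in dB, threshold SPL in dB), sorted by reference SPL.
# Only (45, 15) comes from the description; the rest are made-up values
# chosen so that the threshold decreases as the reference increases.
THRESHOLD_TABLE = [(30.0, 20.0), (45.0, 15.0), (60.0, 10.0), (75.0, 5.0)]


def threshold_for(reference_spl, table=THRESHOLD_TABLE):
    """Look up the threshold SPL for a reference SPL, interpolating
    linearly between table entries and clamping outside the table."""
    if reference_spl <= table[0][0]:
        return table[0][1]
    if reference_spl >= table[-1][0]:
        return table[-1][1]
    for (x0, y0), (x1, y1) in zip(table, table[1:]):
        if x0 <= reference_spl <= x1:
            return y0 + (y1 - y0) * (reference_spl - x0) / (x1 - x0)
    return table[-1][1]  # not reached for a sorted table


threshold_at_45 = threshold_for(45.0)  # 15.0, as in the example above
```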

The sound event detector 232 may detect whether the sound event is included in the sound signal (or whether the sound event has occurred) based on the calculated SPL (detected SPL) of the sound signal and the reference SPL and the threshold SPL.

Referring to the graph shown in (b) of FIG. 6, the sound event detector 232 may detect that the sound event has occurred when the calculated SPL (the detected SPL) is greater than the reference SPL by at least the threshold SPL. For example, if the reference SPL is 45 decibels, the sound event detector 232 may detect that the sound event has occurred when the detected SPL is 60 decibels or more.
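As a non-limiting illustration, the detection condition of the preceding example may be written as a single comparison. The function name sound_event_detected is hypothetical.

```python
def sound_event_detected(detected_spl, reference_spl, threshold_spl):
    """True when the detected SPL exceeds the reference SPL by at least
    the threshold SPL (all values in dB)."""
    return detected_spl >= reference_spl + threshold_spl


# Reference 45 dB and threshold 15 dB: detection starts at 60 dB.
at_boundary = sound_event_detected(60.0, 45.0, 15.0)
just_below = sound_event_detected(59.9, 45.0, 15.0)
```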

A human may not readily perceive sound quieter than the background noise. Based on this, the robot 1 is implemented to detect only a sound event louder than the background noise, thereby detecting the sound event in a manner more similar to a human.

FIG. 7 is a flowchart illustrating detecting occurrence of a sound event from a sound signal and recognizing the sound event when the sound event is detected, according to an embodiment of the present invention.

Referring to FIG. 7, the robot 1 may compare the calculated SPL of the sound signal with the reference SPL of the background noise (S700).

If the calculated SPL is greater than the reference SPL by at least the threshold SPL as the result of comparison (YES of S710), the robot 1 may detect that the sound event has occurred from the sound signal (S720).

The robot 1 may extract a section, in which the sound event is detected, from the sound signal (S730).

The sound event detector 232 may provide the sound slicing module 234 with information on the start point and the end point of the period (operation period) in which the detected sound event has occurred.

The start point may mean an operation period in which occurrence of the sound event is first detected, and the end point may mean an operation period in which occurrence of the sound event is last detected, among the operation periods in which occurrence of the sound event is continuously detected.

The sound slicing module 234 may extract the sound signal between the start point and the end point based on the result of detection of the sound event detector 232.
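As a non-limiting illustration, determining the start point and the end point and extracting the corresponding section may be sketched as follows. The representation of detection results as one boolean per operation period, the fixed number of samples per operation period, and the names event_sections and slice_signal are assumptions made for illustration.

```python
def event_sections(detected_flags):
    """Return (start, end) pairs of operation-period indices, end inclusive,
    for each run of consecutive periods in which a sound event is detected."""
    sections = []
    start = None
    for index, detected in enumerate(detected_flags):
        if detected and start is None:
            start = index  # first period of a new run
        elif not detected and start is not None:
            sections.append((start, index - 1))  # run ended one period ago
            start = None
    if start is not None:
        sections.append((start, len(detected_flags) - 1))
    return sections


def slice_signal(samples, section, samples_per_period):
    """Extract the samples between the start and end operation periods."""
    start, end = section
    return samples[start * samples_per_period:(end + 1) * samples_per_period]


flags = [False, True, True, False, True]
sections = event_sections(flags)  # [(1, 2), (4, 4)]
extracted = slice_signal(list(range(10)), sections[0], samples_per_period=2)
```

Only the extracted section would then be passed on for recognition, which is consistent with activating the sound event recognizer 236 only when a sound event is detected.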

The robot 1 may compare the signal characteristic of the extracted period with the plurality of signal characteristics corresponding to the plurality of prestored sound events (S740). The robot 1 may recognize that the sound event corresponding to the matched signal characteristic has occurred (S750), when a signal characteristic matching the signal characteristic of the extracted period is present in the plurality of signal characteristics.

The sound event recognizer 236 may extract the signal characteristic of the extracted sound signal. The signal characteristic may include the frequency characteristic of the sound event and the signal change characteristic according to elapse of time.

The sound event recognizer 236 may compare the plurality of signal characteristics stored in the memory 16 with the extracted signal characteristic, thereby calculating similarity.

The sound event recognizer 236 may detect that the signal characteristic having the highest calculated similarity among the plurality of signal characteristics, provided that the similarity is equal to or greater than a reference similarity, matches the signal characteristic of the extracted sound signal.

Based on the result of detection, the sound event recognizer 236 may recognize that the sound event included in the extracted sound signal corresponds to the matched signal characteristic.

The robot 1 may control the output unit 14 to provide interaction corresponding to the recognized sound event (S760).

Step S760 is substantially the same as step S150 of FIG. 4 and a description thereof will be omitted.

For the purposes of this discussion, various operations are discussed as being performed by components of robot 1; however, it will be understood that one or more or all of these operations may be performed by other aspects of robot 1, such as the one or more controllers or processors. Further, it will be understood that one or more or all of these operations may be performed via other means, including other terminals or apparatuses configured to perform the operations which are in communication with robot 1.

FIG. 8 is a view showing an example related to the embodiment of FIGS. 4 to 7.

Referring to FIG. 8, the robot 1 may be implemented as a home robot placed in the home.

The robot 1 may acquire a variety of sounds generated in the home through the microphone 124.

For example, a user 801 may return home by opening a front door 802, entering the home and closing the front door 802. At this time, a sound event 803 corresponding to opening/closing of the front door 802 may be generated.

The robot 1 may acquire a sound signal including the sound event 803 through the microphone 124.

The sound event analyzer 200 of the robot 1 may calculate the SPL of the acquired sound signal and compare the calculated SPL with the reference SPL of background noise.

Meanwhile, since the user 801 is not present in the home before the sound event 803 occurs, there may be no sound, or only sound much quieter than the sound event 803, in the home prior to the sound event 803. Accordingly, the calculated SPL may exceed the reference SPL by at least the threshold SPL.

That is, the sound event analyzer 200 may detect that the calculated SPL is greater than the reference SPL by at least the threshold SPL and detect that the sound event 803 has occurred according to the result of detection. As the sound event 803 is detected, the sound event analyzer 200 may recognize the sound event 803 through the sound event recognizer 236.

The sound event recognizer 236 may recognize that the sound event 803 is “opening/closing of the front door”. Accordingly, the robot 1 may perform interaction with the user 801 using interaction data corresponding to “opening/closing of the front door”. For example, the interaction data may include voice data corresponding to “Hi. Welcome”. In this case, the processor 172 may perform interaction with the user by outputting the voice data through the sound output unit 144.

That is, according to the embodiments shown in FIGS. 4 to 8, the robot 1 may acquire sound generated in the vicinity thereof to automatically recognize occurrence of a specific event and intelligently perform interaction according to the recognized event.

In addition, the robot 1 may filter the sound signal acquired through the microphone 124 based on the human auditory characteristics or detect the sound event louder than the background noise, thereby reacting to the sound event and acting similarly to a human.

Hereinafter, embodiments related to operation of detecting a direction, in which sound is generated, at the robot 1 will be described.

FIG. 9 is a block diagram showing a controller of a robot according to an embodiment of the present invention.

As described above with reference to FIGS. 1 to 2, the robot 1 may include a plurality of microphones and detect the direction, in which sound is generated, based on sound signals received from the plurality of microphones.

Referring to FIG. 9, the robot 1 may include a sound event direction detector 900 for receiving the sound signal of each of the plurality of microphones 124 a, 124 b, 124 c and 124 d and detecting a direction in which sound (or a sound event) is generated.

Meanwhile, although the robot 1 is shown as including four microphones 124 a to 124 d in FIG. 9, the robot 1 may include any plurality (preferably at least three) of microphones. The plurality of microphones 124 a to 124 d may be disposed to be spaced apart from each other, at least in a lateral direction. For example, as shown in FIG. 11, the first microphone 124 a may be disposed on the front left side of the robot 1 and the second microphone 124 b may be disposed on the front right side of the robot 1. In addition, the third microphone 124 c may be disposed on the rear right side of the robot 1 and the fourth microphone 124 d may be disposed on the rear left side of the robot 1. That is, as the plurality of microphones 124 a to 124 d are disposed to be spaced apart from each other, the SPLs of the sound signals acquired by the microphones 124 a to 124 d may be different and times when the sound signals are acquired by each microphone may be different.

The sound event direction detector 900 may detect the direction, in which sound (or a sound event) corresponding to the sound signal is generated, based on a difference between the SPLs of the sound signals received from the plurality of microphones 124 a to 124 d or a difference between the times when the sound signals are acquired.

Meanwhile, the robot 1 may further include a user recognizer 1000 for recognizing a user located around the robot 1 based on an image acquired through the camera 132.

The user recognizer 1000 may recognize the user located around the robot 1 using various known face recognition algorithms. At this time, the user recognized by the user recognizer 1000 may include not only a user already registered in the robot 1 but also other persons.

In some embodiments, the sound event analyzer 200, the sound event direction detector 900 and the user recognizer 1000 may correspond to the one or more controllers 17 or one or more processors 172, or other components of robot 1 configured to perform the operations described herein for the sound event analyzer 200, sound event direction detector 900 and user recognizer 1000.

FIG. 10 is a flowchart illustrating operation of detecting a direction, in which sound is generated, according to an embodiment of the present invention and control operation related thereto.

The embodiments of FIGS. 10 to 16 may be performed in parallel with the embodiments of FIGS. 4 to 8. However, in some embodiments, the embodiments of FIGS. 10 to 16 may be performed only when the sound event is recognized according to the embodiments of FIGS. 4 to 8.

Referring to FIG. 10, the robot 1 may receive a sound signal from each of the plurality of microphones 124 a to 124 d (S1000) and detect the direction, in which sound is generated, based on the received sound signals (S1010).

For example, the sound event direction detector 900 included in the robot 1 may calculate the SPL of each of the sound signals received from the plurality of microphones 124 a to 124 d. In this case, the sound event direction detector 900 may calculate the SPL of each of the sound signals using the SPL calculation block 210 described above with reference to FIG. 3. Alternatively, the sound event direction detector 900 may directly include an element equal or similar to the SPL calculation block 210.

The sound event direction detector 900 may detect the direction, in which sound is generated, based on the calculated SPLs. For example, the sound event direction detector 900 may estimate a distance between each of the plurality of microphones 124 a to 124 d and a position where sound is generated from the SPL of the sound signal corresponding to each of the plurality of microphones 124 a to 124 d. The sound event direction detector 900 may detect the direction in which sound is generated, using the position of each of the plurality of microphones 124 a to 124 d and a triangulation method based on the estimated distance.

Alternatively, the sound event direction detector 900 may detect the direction, in which sound is generated, using a difference between times when the sound signals received from the plurality of microphones 124 a to 124 d are acquired.
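As a non-limiting illustration, direction detection from differences in acquisition time may be sketched with a far-field (plane wave) least-squares fit. The specification does not disclose the method used; the plane-wave assumption, the two-dimensional coordinate system (x to the right of the robot, y toward its front, in meters), the microphone spacing, the assumed speed of sound of 343 m/s and the name direction_from_arrival_times are all hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature


def direction_from_arrival_times(mic_positions, arrival_times):
    """Estimate the bearing of a distant source, in degrees measured
    counterclockwise from the +x axis, from per-microphone arrival times.

    For a plane wave arriving from unit direction u, a microphone at
    position p hears the sound at t = t0 - (p . u) / c. Subtracting the
    means removes t0, and the 2x2 normal equations are solved for u.
    """
    n = len(mic_positions)
    mean_x = sum(p[0] for p in mic_positions) / n
    mean_y = sum(p[1] for p in mic_positions) / n
    mean_t = sum(arrival_times) / n
    sxx = sxy = syy = bx = by = 0.0
    for (x, y), t in zip(mic_positions, arrival_times):
        dx, dy = x - mean_x, y - mean_y
        b = -SPEED_OF_SOUND * (t - mean_t)
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
        bx += dx * b
        by += dy * b
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("microphones must not be collinear")
    ux = (syy * bx - sxy * by) / det
    uy = (sxx * by - sxy * bx) / det
    return math.degrees(math.atan2(uy, ux))


# Front-left, front-right, rear-right, rear-left microphones (hypothetical
# 10 cm square). A source behind the robot reaches the rear pair first.
mics = [(-0.05, 0.05), (0.05, 0.05), (0.05, -0.05), (-0.05, -0.05)]
times = [y / SPEED_OF_SOUND for _, y in mics]
bearing = direction_from_arrival_times(mics, times)  # about -90 (rear)
```

The fit yields only a bearing, not a distance, which is consistent with the robot detecting the direction in which sound is generated rather than its exact position.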

In some embodiments, step S1010 may be performed only when the sound event analyzer 200 described above with reference to FIGS. 3 to 8 detects that the sound event is included in the sound signal and the detected sound event is recognized.

The robot 1 may control the rotation mechanism 15 such that the display unit 142 is directed in the detected direction (S1020).

Meanwhile, since the SPLs of the sound signals received from the microphones 124 a to 124 d are not constant, the robot 1 may detect only the direction, in which sound is generated, by performing step S1010, but may not accurately detect a position where sound is generated.

Accordingly, the processor 172 may control the rotation mechanism 15 such that the front surface, that is, the display unit 142, of the robot 1 is directed in the detected direction.

The robot 1 may acquire an image through the camera 132 (S1030), and recognize presence/absence of a user from the acquired image (S1040).

When the front surface of the robot 1 is directed in the detected direction, the camera 132 installed in the robot 1 may be directed in the detected direction. That is, the camera 132 may be disposed to be directed in the same direction as the display unit 142.

The processor 172 may acquire an image of the detected direction through the camera 132 when the front surface is directed in the detected direction, under control of the rotation mechanism 15.

The user recognizer 1000 included in the robot 1 may recognize whether a user (person) is included in the acquired image using various known face recognition algorithms.

When the user is present as the result of recognition (YES of S1050), the robot 1 may interact with the user based on the sound (S1060).

In contrast, when the user is not present as the result of recognition (NO of S1050), the robot 1 may perform operation corresponding to the sound (S1070).

For example, interaction data corresponding to sound (or a sound event) may be stored in the memory 16. In addition, the interaction data may be classified into interaction data when a user is present and interaction data when a user is not present.

As described above, the interaction data may include graphics, text, sound, voice data, etc. output through the output unit 14. In addition, the interaction data may include a message or notification to be transmitted to the mobile terminal of the user through the communication unit 11 or information related to a variety of processing operations performed in association with the sound event.

Accordingly, the processor 172 may perform communication with the user based on the interaction data when the user is recognized according to the result of recognition of the user recognizer 1000. In contrast, when the user is not recognized, the processor 172 may process a simple reaction (sound output, etc.) to the sound event or transmit a message or notification related to the sound event to the terminal of the user. For the purposes of this discussion, various operations are discussed as being performed by components of robot 1; however, it will be understood that these components are implemented as one or more other aspects of robot 1, or that one or more or all of these operations may be performed by one or more other aspects of robot 1, such as the one or more controllers or processors. Further, it will be understood that one or more or all of these operations may be performed via other means, including other terminals or apparatuses configured to perform the operations which are in communication with robot 1.

Embodiments related thereto will be described with reference to FIGS. 11 to 16.

FIGS. 11 to 12 are views showing examples related to the embodiment of FIG. 10.

Referring to FIGS. 11 and 12, a user 1100 may utter voice 1110 for calling the robot 1.

Each of the plurality of microphones 124 a to 124 d provided in the robot 1 may receive sound including the voice 1110 of the user 1100 and acquire sound signals S1 to S4 corresponding to the received sound.

The sound event direction detector 900 of the robot 1 may detect a direction, in which the voice 1110 is generated, based on the sound signals S1 to S4.

As shown in FIG. 11, when the voice 1110 is uttered on the rear side of the robot 1, the SPLs of the first sound signal S1 and the second sound signal S2 may be lower than the SPLs of the third sound signal S3 and the fourth sound signal S4. In addition, times when the first sound signal S1 and the second sound signal S2 are acquired may be later than times when the third sound signal S3 and the fourth sound signal S4 are acquired.

The sound event direction detector 900 may detect that the direction, in which the voice 1110 is generated, is the rear side of the robot 1, based on the SPLs of the sound signals S1 to S4 or the times when the sound signals S1 to S4 are acquired.

As shown in FIG. 12, the processor 172 may control the rotation mechanism 15 such that the front surface of the robot 1 is directed in the detected direction. According to the result of control, the display unit 142 of the robot 1 may be directed in the detected direction.

When the front surface of the robot 1 is directed in the detected direction under control of the rotation mechanism 15, the processor 172 may acquire an image of the detected direction through the camera 132.

The user recognizer 1000 included in the robot 1 may recognize that the user 1100 is included in the acquired image using various known face recognition algorithms.

The processor 172 may output a message 1120 for interaction (communication) with the user through the display unit 142 or the sound output unit 144 based on the result of recognition.

FIGS. 13 to 15 are views showing examples of operation of recognizing a user related to sound at a robot when a plurality of users is located in a direction in which sound is generated.

In some embodiments, a plurality of users may be located in the direction in which sound is detected. In general, a user who wants to interact (communicate) with the robot 1 is highly likely to be closer to the robot 1 than the other users.

Accordingly, when a plurality of users is located in the direction in which sound is detected, the robot 1 may recognize a user closest to the robot 1 as a user who utters the sound, and interact with the user.

Specifically, referring to FIGS. 13 and 14, the robot 1 may rotate such that the front surface thereof is directed in the direction in which the voice 1320 is generated.

The robot 1 may acquire an image (indicated as IMAGE in FIG. 14) of the direction, in which the voice 1320 is generated, through the camera 132, and recognize a user from the image IMAGE through the user recognizer 1000.

When a plurality of users 1300 and 1310 is recognized as the result of recognition, the processor 172 or the user recognizer 1000 may recognize the user 1310 closer to the robot 1 between the plurality of users 1300 and 1310 as a user who utters the voice 1320.

For example, the processor 172 or the user recognizer 1000 may recognize the second user 1310 having a larger facial region between the facial regions 1400 and 1410 of the recognized users 1300 and 1310 as the user closer to the robot 1.
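As a non-limiting illustration, selecting the user having the larger facial region may be sketched as follows. The representation of each facial region as a (width, height) bounding box in pixels, the example sizes and the name closest_user are hypothetical; the dictionary keys simply reuse the reference numerals of the two users.

```python
def closest_user(facial_regions):
    """Return the identifier of the user whose facial region has the
    largest area, treated here as the user closest to the robot.

    facial_regions maps a user identifier to a (width, height) bounding
    box in pixels; returns None when no face was recognized.
    """
    if not facial_regions:
        return None
    return max(
        facial_regions,
        key=lambda user: facial_regions[user][0] * facial_regions[user][1],
    )


# Hypothetical bounding boxes for the first user 1300 and second user 1310.
speaker = closest_user({1300: (40, 50), 1310: (80, 100)})  # 1310
```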

Accordingly, as shown in FIG. 15, the robot 1 may control the rotation mechanism 15 to face the second user 1310 and then output a message 1330 for performing interaction with the second user 1310. The second user 1310 may perceive that the robot 1 has correctly recognized the second user 1310 who utters the voice 1320, by confirming that the robot 1 rotates to face the second user.

FIG. 16 is a view showing an example of operation performed by a robot when a user is not located in a direction in which sound is generated.

Referring to FIG. 16, the robot 1 may receive sound signals corresponding to a baby crying 1600 and control the rotation mechanism 15 such that the front surface of the robot 1 is directed in a direction, in which the baby is located, based on the received sound signals.

The processor 172 may acquire an image of the direction, in which the baby is located, through the camera 132. The user recognizer 1000 may recognize whether an already registered user (e.g., parent) is present from the acquired image.

When the user is not present as the result of recognition, the processor 172 may perform an operation corresponding to the baby crying 1600 based on, among the interaction data corresponding to the baby crying 1600, the interaction data for the case where the user is not present.

For example, the interaction data may include sound 1610 to try to calm or soothe the baby and notification (event information EVENT INFO) to be transmitted to the terminal 1700 of the user (such as a parent).

Based on the interaction data, the processor 172 may output sound 1610 through the sound output unit 144 and transmit the event information EVENT INFO to the terminal 1700 of the user through the communication unit 11. The transmitted event information EVENT INFO may be output on the screen of the terminal 1700 in the form of notification 1710, such that the user recognizes that the baby is crying.
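
The selection of interaction data by user presence can be illustrated with a small lookup. The sketch below is an assumption-laden example rather than the disclosed data format: the `InteractionData` type, the `INTERACTION_TABLE` keys, the `react` helper, and all string values are hypothetical. The entry for the case where the user is present is a placeholder only, since that case is not described in this passage.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class InteractionData:
    """Hypothetical record of what to do for one (event, presence) case."""
    sound: Optional[str] = None         # to play through the sound output unit
    notification: Optional[str] = None  # to send to the user's terminal


# Hypothetical table keyed by (recognized event, user present?).
INTERACTION_TABLE: Dict[Tuple[str, bool], InteractionData] = {
    ("baby_crying", False): InteractionData(
        sound="soothing_sound",
        notification="The baby is crying.",
    ),
    # Placeholder entry; the user-present behavior is an assumption.
    ("baby_crying", True): InteractionData(sound="alert_tone"),
}


def react(event: str, user_present: bool) -> List[str]:
    """Return the actions to perform, as simple strings for illustration."""
    data = INTERACTION_TABLE.get((event, user_present))
    actions: List[str] = []
    if data is None:
        return actions  # no interaction data stored for this case
    if data.sound is not None:
        actions.append(f"play:{data.sound}")
    if data.notification is not None:
        actions.append(f"notify:{data.notification}")
    return actions


print(react("baby_crying", user_present=False))
```

Keying the table on both the event and the presence flag keeps the two reactions to the same sound event side by side, which mirrors the distinction drawn in the description of FIG. 16.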

That is, according to the embodiments shown in FIGS. 9 to 16, the robot 1 may efficiently detect the direction, in which the sound event has occurred, using the plurality of microphones.

In addition, the robot 1 may acquire the image of the detected direction using the camera 132 and recognize whether a user is present from the image, thereby more intelligently reacting to the sound event depending on whether the user is present.

According to the embodiment of the present invention, the robot may acquire sound generated in the vicinity of the robot and automatically recognize occurrence of a specific event, thereby intelligently performing interaction according to the recognized event.

In addition, the robot may filter a sound signal acquired through the microphone based on human auditory characteristics or detect a sound event louder than background noise, thereby reacting to the sound event and acting similarly to a human.

A sound event recognizer implemented in the robot may be activated only when occurrence of a sound event is detected by a sound event detector. Accordingly, it is possible to efficiently reduce the processing load as compared to the case where the sound event recognizer is continuously activated, thereby improving the overall processing speed and performance of the robot.
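
The gating of the recognizer by the detector can be sketched as below. This is a minimal illustration under stated assumptions: `sound_level_db` and `process_frames` are hypothetical helpers, levels are computed in dB relative to full scale rather than as calibrated sound pressure levels, and the fixed `margin_db` above the reference level is a placeholder rule, not the threshold rule of the embodiments.

```python
import math
from typing import Callable, List, Optional, Sequence


def sound_level_db(frame: Sequence[float]) -> float:
    """RMS level of one frame in dB relative to full scale (uncalibrated)."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20.0 * math.log10(max(rms, 1e-12))  # floor avoids log10(0)


def process_frames(
    frames: Sequence[Sequence[float]],
    reference_db: float,
    margin_db: float,
    recognize: Callable[[Sequence[float]], str],
) -> List[Optional[str]]:
    """Run the (costly) recognizer only on frames the detector flags."""
    results: List[Optional[str]] = []
    for frame in frames:
        # Detector: cheap level comparison against the background reference.
        if sound_level_db(frame) > reference_db + margin_db:
            results.append(recognize(frame))  # recognizer activated
        else:
            results.append(None)              # recognizer stays idle
    return results


calls = []

def fake_recognizer(frame: Sequence[float]) -> str:
    """Stand-in recognizer that records how often it is invoked."""
    calls.append(frame)
    return "event"

quiet, loud = [0.001] * 4, [0.5] * 4
print(process_frames([quiet, loud], -60.0, 10.0, fake_recognizer))
print(len(calls))  # the recognizer ran once, for the loud frame only
```

The per-frame cost when nothing happens is a single level computation, which is the source of the load reduction described above.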

In addition, the robot can efficiently detect a direction, in which a sound event occurs, using a plurality of microphones.
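
One way a direction may be derived from differences in arrival time is the standard far-field relation for a pair of microphones, sin(theta) = c * dt / d. The sketch below is an illustrative assumption and not the method of the embodiments: `arrival_angle_deg`, the two-microphone arrangement, and the speed-of-sound constant (about 343 m/s in air at room temperature) are all introduced here for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


def arrival_angle_deg(delay_s: float, mic_spacing_m: float) -> float:
    """Estimate the arrival angle from the delay between two microphones.

    The angle is measured from the broadside direction (perpendicular to
    the line joining the microphones), assuming a distant source so that
    the wavefront is approximately planar.
    """
    if mic_spacing_m <= 0:
        raise ValueError("mic_spacing_m must be positive")
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    # Clamp to the valid asin domain; noise can push the ratio past +/-1.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


# Zero delay means the source is straight ahead of the microphone pair.
print(arrival_angle_deg(0.0, 0.1))
```

A single pair cannot distinguish front from back, so an arrangement with more microphones, as in the embodiments, would be needed to resolve a full direction.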

In addition, the robot may acquire the image of a direction detected using a camera and recognize presence of a user from the image, thereby more intelligently reacting to a sound event depending on whether a user is present.

The foregoing description is merely illustrative of the technical idea of the present invention, and various changes and modifications may be made by those skilled in the art without departing from the essential characteristics of the present invention.

Therefore, the embodiments disclosed in the present invention are intended to illustrate rather than limit the scope of the present invention, and the scope of the technical idea of the present invention is not limited by these embodiments.

The scope of the present invention should be construed according to the following claims, and all technical ideas within the scope of equivalents should be construed as falling within the scope of the present invention.

The various devices, modules, terminals, and the like discussed herein may be implemented on a computer by execution of software comprising machine instructions read from non-transitory computer-readable medium. Non-transitory computer readable medium may refer to any medium that participates in holding instructions for execution by the processor, or that stores data for processing by a computer, and comprise all computer-readable media, with the sole exception being a transitory, propagating signal. Such a non-transitory computer readable medium may include, but is not limited to, non-volatile media, volatile media, and temporary storage media (e.g., cache memory). Non-volatile media may include optical or magnetic disks, such as an additional storage device. Volatile media may include dynamic memory, such as main memory. Common forms of non-transitory computer-readable media may include, for example, a hard disk, a floppy disk, magnetic tape, or any other magnetic medium, a CD-ROM, DVD, Blu-ray or other optical medium, RAM, PROM, EPROM, FLASH-EPROM, any other memory card, chip, or cartridge, or any other memory medium from which a computer can read.

In certain embodiments, several hardware aspects may be implemented using a single computer, terminal, or apparatus; in other embodiments, multiple computers, input/output systems and hardware may be used to implement the system. For a software implementation, certain embodiments described herein may be implemented with separate software modules, such as procedures and functions, each of which performs one or more of the functions and operations described herein. The software codes can be implemented with a software application written in any suitable programming language and may be stored in memory and executed by a controller or processor.

The foregoing disclosed embodiments and features are merely exemplary and are not to be construed as limiting the present invention. The present teachings can be readily applied to other types of apparatuses and processes. The description of such embodiments is intended to be illustrative, and not to limit the scope of the claims. Many alternatives, modifications, and variations will be apparent to those skilled in the art.

What is claimed is:
 1. A robot comprising: a microphone configured to receive sound signals; and one or more controllers configured to: determine a reference sound pressure level of background noise based on a sound signal received at a first time point via the microphone; detect occurrence of a sound event based on the reference sound pressure level and a sound pressure level of a sound signal received at a second time point via the microphone; recognize an event corresponding to the detected sound event; and control an operation of the robot based on the recognized event.
 2. The robot of claim 1, wherein the one or more controllers are further configured to detect the sound event when the sound pressure level of the sound signal received at the second time point exceeds a threshold sound pressure level which is set based on the reference sound pressure level.
 3. The robot of claim 2, wherein the threshold sound pressure level decreases as the reference sound pressure level increases.
 4. The robot of claim 1, wherein the sound signal received at the first time point and the sound signal received at the second time point are received within a predetermined operation period of time.
 5. The robot of claim 4, wherein the one or more controllers are further configured to determine a maximum sound pressure level, a minimum sound pressure level, and information on changes of sound pressure during the predetermined operation period of time, and wherein the reference sound pressure level is varied based on the determined maximum sound pressure level, the minimum sound pressure level, and the information on changes of sound pressure.
 6. The robot of claim 1, wherein the one or more controllers are further configured to identify a particular section of the operation period of time and extract an event sound signal corresponding to the particular section based on the identification.
 7. The robot of claim 1, further comprising a memory configured to store sound information of a plurality of events, wherein the one or more controllers are further configured to: extract a signal characteristic of the extracted event sound signal; compare the extracted signal characteristic with the stored sound information of the plurality of events; and output a response corresponding to a recognized event of the extracted signal characteristic.
 8. The robot of claim 7, wherein the extracted signal characteristic includes at least a frequency characteristic or a signal change characteristic according to a lapse of time.
 9. The robot of claim 7, wherein the one or more controllers are further configured to: determine a similarity value between the extracted signal characteristic of the extracted event sound signal and each of the stored sound information of the plurality of events; wherein the recognized event corresponds to a highest determined similarity value among the stored sound information of the plurality of events which have determined similarity values greater than or equal to a reference similarity value.
 10. The robot of claim 1, further comprising a display and a sound output unit, wherein the operation of the robot comprises outputting information via the display or the sound output unit based on the recognized event.
 11. A machine-readable non-transitory medium having stored thereon machine-executable instructions for controlling a robot, the instructions comprising: determining a reference sound pressure level of background noise based on a sound signal received at a first time point via a microphone of the robot; detecting occurrence of a sound event based on the reference sound pressure level and a sound pressure level of a second sound signal received at a second time point via the microphone; recognizing an event corresponding to the detected sound event; and controlling an operation of the robot based on the recognized event.
 12. The machine-readable non-transitory medium of claim 11 further having stored thereon machine-executable instructions for: detecting the sound event when the sound pressure level of the sound signal received at the second time point exceeds a threshold sound pressure level which is set based on the reference sound pressure level.
 13. The machine-readable non-transitory medium of claim 12 wherein the threshold sound pressure level decreases as the reference sound pressure level increases.
 14. The machine-readable non-transitory medium of claim 11 wherein the sound signal received at the first time point and the sound signal received at the second time point are received within a predetermined operation period of time.
 15. The machine-readable non-transitory medium of claim 14 further having stored thereon machine-executable instructions for: determining a maximum sound pressure level, a minimum sound pressure level, and information on changes of sound pressure during the predetermined operation period of time, and wherein the reference sound pressure level is varied based on the determined maximum sound pressure level, the minimum sound pressure level, and the information on changes of sound pressure.
 16. The machine-readable non-transitory medium of claim 11 further having stored thereon machine-executable instructions for: identifying a particular section of the operation period of time and extracting an event sound signal corresponding to the particular section based on the identification.
 17. The machine-readable non-transitory medium of claim 11 further having stored thereon machine-executable instructions for: extracting a signal characteristic of the extracted event sound signal; comparing the extracted signal characteristic with the stored sound information of the plurality of events; and outputting a response corresponding to a recognized event of the extracted signal characteristic.
 18. The machine-readable non-transitory medium of claim 17 wherein the extracted signal characteristic includes at least a frequency characteristic or a signal change characteristic according to a lapse of time.
 19. The machine-readable non-transitory medium of claim 17 further having stored thereon machine-executable instructions for: determining a similarity value between the extracted signal characteristic of the extracted event sound signal and each of the stored sound information of the plurality of events; wherein the recognized event corresponds to a highest determined similarity value among the stored sound information of the plurality of events which have determined similarity values greater than or equal to a reference similarity value.
 20. The machine-readable non-transitory medium of claim 11 wherein the operation of the robot comprises outputting information via a display or a sound output unit of the robot based on the recognized event.
 21. A robot comprising: a plurality of microphones configured to receive sound signals; a display; a camera; and one or more controllers configured to: determine a source direction of a sound signal received via the plurality of microphones; and control a rotation mechanism of the robot to rotate the robot such that the display and the camera face the source direction of the sound signal.
 22. The robot of claim 21, where the one or more controllers are further configured to: determine one or more sound pressure levels of the sound signal received via the plurality of microphones; estimate a distance between a position of a source of the sound signal and each of the plurality of microphones based on the determined one or more sound pressure levels, wherein the source direction is determined based on the estimated distances.
 23. The robot of claim 21, wherein the source direction is determined based on differences in time at which each of the plurality of microphones receives the sound signal.
 24. The robot of claim 21, wherein the one or more controllers are further configured to: control the camera to capture an image when facing the source direction; and detect presence of a user based on the captured image.
 25. The robot of claim 24, wherein the one or more controllers are further configured to control the robot according to interaction information when presence of the user is detected based on the captured image.
 26. The robot of claim 24, wherein the one or more controllers are further configured to identify a specific person as the speaker of the sound signal wherein a face of the specific person appears larger than any other face in the captured image.
 27. The robot of claim 24 further comprising a communication unit configured to communicate with an external terminal, wherein the one or more controllers are further configured to transmit a message related to the received sound signal to a terminal of the user.
 28. The robot of claim 21, wherein the rotation mechanism comprises a motor configured to rotate at least a portion of the robot to face different directions.